Deep Feature-Based Text Clustering and its Explanation

Abstract

Text clustering is a critical step in text data analysis and has been extensively studied by the text mining community. Most existing text clustering algorithms are based on the bag-of-words model, which faces high-dimensional and sparsity problems and ignores text structural and sequence information. Deep learning-based models such as convolutional neural networks and recurrent neural networks regard texts as sequences but lack supervised signals and explainable results. In this paper, we propose a deep feature-based text clustering (DFTC) framework that incorporates pretrained text encoders into text clustering tasks. This model, which is based on sequence representations, breaks the dependency on supervision. The experimental results show that our model outperforms classic text clustering algorithms and the state-of-the-art pretrained language model, i.e., BERT, on almost all the considered datasets. In addition, the explanation of the clustering results is significant for understanding the principles of the deep learning approach. Our proposed clustering framework includes an explanation module that can help users understand the meaning and quality of the clustering results.
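The general idea of clustering texts on dense encoder features rather than on sparse bag-of-words counts can be sketched as follows. This is only an illustration under stated assumptions, not the DFTC implementation: `encode` is a hypothetical toy stand-in (hashed character trigrams) for a pretrained text encoder, and `kmeans` and `cluster_texts` are illustrative helper names; the abstract does not specify which encoder or clustering algorithm the authors use.

```python
import math
import random


def encode(text, dim=32):
    """Toy stand-in for a pretrained text encoder (assumption, not DFTC's encoder).

    Hashes character trigrams into a fixed-size dense vector and L2-normalises it.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # Deterministic hash (Python's built-in str hash is salted per process).
        h = sum(ord(ch) * (31 ** pos) for pos, ch in enumerate(trigram))
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_texts(texts, k):
    """Encode each text into a dense feature vector, then cluster the vectors."""
    features = [encode(t) for t in texts]
    return kmeans(features, k)


texts = [
    "stock markets fell sharply today",
    "stock markets rose sharply today",
    "the team won the football match",
    "the team lost the football match",
]
print(cluster_texts(texts, k=2))
```

No supervised labels are needed at any point: the encoder supplies the features and the clustering step groups them. In practice the toy `encode` would be replaced by a real pretrained sentence or document encoder.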

Similar resources

Explanation-Based Feature Construction

Choosing good features to represent objects can be crucial to the success of supervised machine learning algorithms. Good high-level features are those that concentrate information about the classification task. Such features can often be constructed as non-linear combinations of raw or native input features such as the pixels of an image. Using many nonlinear combinations, as do SVMs, can dilu...

Deep Web Classification based on Domain Feature Text

The deep web provides a tremendous amount of high-quality structured data. In order to retrieve deep web data, one important task is to classify the domains of the deep web automatically. In this paper, an approach based on domain feature text (DFT) is presented to classify the deep web. In the phase of DFT selection, a semantic abstract method based on ontology knowledge and a quantitative criterion for DFT s...

Text feature extraction based on deep learning: a review

Selection of text feature items is a basic and important matter for text mining and information retrieval. Traditional methods of feature extraction require handcrafted features. Hand-designing an effective feature is a lengthy process, but for new applications, deep learning makes it possible to acquire new effective feature representations from training data. As a new feature extraction method, de...

Deep CNN based feature extractor for text-prompted speaker recognition

Deep learning is still not a very common tool in the speaker verification field. We study deep convolutional neural network performance in the text-prompted speaker verification task. The prompted passphrase is segmented into word states (i.e., digits) to test each digit utterance separately. We train a single high-level feature extractor for all states and use a cosine similarity metric for scoring...

The Feature Selection Method based on Genetic Algorithm for Efficient of Text Clustering and Text Classification

Big Data means a very large amount of data and includes a range of methodologies such as big data collection, processing, storage, management, and analysis. Since Big Data text mining extracts a lot of features and data, clustering and classification can result in high computational complexity and low reliability of the analysis results. In particular, a TDM (Term Document Matrix) obtained ...

Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2022

ISSN: 1558-2191, 1041-4347, 2326-3865

DOI: https://doi.org/10.1109/tkde.2020.3028943